High-performance on-module caching architectures for non-volatile dual in-line memory module (NVDIMM)

ABSTRACT

A high-performance on-module caching architecture for hybrid memory modules is provided. A hybrid memory module includes a cache controller, a first volatile memory coupled to the cache controller, a first multiplexing data buffer coupled to the first volatile memory and the cache controller, and a first non-volatile memory coupled to the first multiplexing data buffer and the cache controller, wherein the first multiplexing data buffer multiplexes data between the first volatile memory and the first non-volatile memory and wherein the cache controller enables a tag checking operation to occur in parallel with a data movement operation. The hybrid memory module includes a volatile memory tag unit coupled to the cache controller, wherein the volatile memory tag unit includes a line connection that allows the cache controller to store a plurality of tags in the volatile memory tag unit and retrieve the plurality of tags from the volatile memory tag unit.

BACKGROUND

Non-volatile dual in-line memory modules (NVDIMM) are hybrid memory modules that utilize volatile and non-volatile memory to perform memory operations for computer and server systems. Volatile memory in NVDIMM modules may be used for caching during normal computer operations, while non-volatile memory may be used as persistent back-up memory for data stored at the volatile memory. Accordingly, the contents of volatile memory are transferred to non-volatile memory and vice versa. In order to operate as an NVDIMM system, the hybrid modules typically utilize a centralized controller, centralized buffer, and separate dynamic random-access memory (DRAM) dies for tags and data. As a result, as data and commands are sent by the host processor to the NVDIMM system, they are generally processed and managed locally in the centralized controller. Since typical NVDIMM systems utilize the centralized controller to act as a controller for DRAM and perform cache operations and the centralized buffer to buffer external signals, typical NVDIMM systems tend to be inefficient in data transfer and memory recall. The inefficiency created by using typical NVDIMM systems results in NVDIMM devices that have high access latency and lower bandwidth.

BRIEF DESCRIPTION OF THE DRAWINGS

The present disclosure may be better understood, and its numerous features and advantages made apparent to those skilled in the art by referencing the accompanying drawings. The use of the same reference symbols in different drawings indicates similar or identical items.

FIG. 1 is a block diagram illustrating a host based hybrid memory system according to various embodiments.

FIG. 2 is a block diagram illustrating a hybrid memory system according to various embodiments.

FIG. 3 is a block diagram illustrating a hybrid memory system according to various embodiments.

FIG. 4 is a block diagram illustrating a hybrid memory system according to various embodiments.

FIG. 5 is a block diagram illustrating a hybrid memory system according to various embodiments.

DETAILED DESCRIPTION

FIGS. 1-5 illustrate example systems of high-performance on-module caching architectures for non-volatile dual in-line memory modules (NVDIMMs) according to various embodiments. In various embodiments, a hybrid memory module includes a cache controller, a first volatile memory coupled to the cache controller, a first multiplexing data buffer coupled to the first volatile memory and the cache controller, and a first non-volatile memory coupled to the first multiplexing data buffer and the cache controller, wherein, during a data movement operation, the first multiplexing data buffer multiplexes data between the first volatile memory and the first non-volatile memory and wherein the cache controller enables a tag checking operation to occur in parallel with the data movement operation. In various embodiments, the hybrid memory module includes a first volatile memory tag unit coupled to the cache controller, wherein the first volatile memory tag unit has a line connection dedicated to allowing the cache controller to store a plurality of tags in the first volatile memory tag unit and retrieve the plurality of tags from the first volatile memory tag unit. By placing caching and control operations at the NVDIMM, a processing system more efficiently executes memory access operations, thereby improving overall efficiency of the processing system.

FIG. 1 illustrates a host based hybrid memory system 100 according to various embodiments. In various embodiments, host based hybrid memory system 100 is an NVDIMM based system that combines volatile memory with non-volatile memory to perform memory operations for various computer based systems. In various embodiments, the NVDIMM system is implemented as an NVDIMM-P module. NVDIMM-P is a hybrid memory module that combines persistent DRAM with flash storage. In various embodiments, the NVDIMM-P module is considered a load-reduced DIMM memory module (LRDIMM) that provides memory controller access to non-volatile memory through an NVDIMM-P interface. For the NVDIMM-P configuration, DRAM is utilized as a cache and has access to flash memory. In various embodiments, the configurations described in hybrid memory module 121 of FIG. 1, hybrid memory module 221 of FIG. 2, hybrid memory module 321 of FIG. 3, hybrid memory module 421 of FIG. 4, and hybrid memory module 521 of FIG. 5 are NVDIMM-P configurations.

In various embodiments, host based hybrid memory system 100 includes a host 101, a system bus 110, and a hybrid memory module 121. Host 101 is coupled to hybrid memory module 121 via system bus 110. In various embodiments, host 101 includes a host processor 102 and a host memory controller 104. Host memory controller 104 is coupled to host processor 102. In various embodiments, the architecture of host 101, including host processor 102 and host memory controller 104, is based on, for example, ARM, Intel x86, MIPS, etc. In various embodiments, JEDEC standards are used for interface 142 between host 101 and system bus 110.

During operation of host based hybrid memory system 100, host memory controller 104 communicates with hybrid memory module 121 to perform normal, backup, and/or restore operations using interface 142, system bus 110, line 132, line 134, and line 136. In various embodiments, the terms “line” and/or “lines” described herein are considered a plurality of lines (i.e., not a single line), buses or communication links that allow for the communication and/or transfer of data signals, control signals, and/or other types of signals within, for example, the prescribed systems. In various embodiments, line 132 is used as a data bus to transfer data to and from hybrid memory module 121. Line 136 is used as a data bus to transfer data to and from hybrid memory module 121. Line 134 is used to transfer command and address signals to and from hybrid memory module 121. During operation of host based hybrid memory system 100, host memory controller 104 may request and obtain data from non-volatile memory located in hybrid memory module 121. In various embodiments, the data is eventually transferred to non-volatile memory located in hybrid memory module 121, but may be cached temporarily in volatile memory. In various embodiments, hybrid memory module 121 processes control signals (e.g., line 134) which may be defined as signals that allow hybrid memory module 121 to determine the locations of the data in volatile memory and return data to host 101. When data is available for use by host 101, hybrid memory module 121 sends a command to host 101 to notify host memory controller 104 that data is ready for retrieval by host 101. Thus, the configuration of the host based hybrid memory system 100 supports caching and/or control operations at the module, allowing for improved speed and access to the volatile and non-volatile memory. That is, in various embodiments, by locating the cache controller at the memory module, cache control operations are offloaded from the processor, thereby enhancing processing efficiency and speed.

FIG. 2 illustrates a hybrid memory system 200 according to various embodiments. Hybrid memory system 200 includes a hybrid memory module 221, an external parallel data bus 252, an external command-address bus 254, and an external parallel data bus 256. In various embodiments, external parallel data bus 252 and external parallel data bus 256 may be considered a single external parallel data bus. The configuration described in hybrid memory module 221 is, for example, an NVDIMM-P configuration. Hybrid memory module 221 includes a non-volatile memory 202, a non-volatile memory 204, a non-volatile memory 206, a non-volatile memory 208, a volatile memory 212, a volatile memory 214, a volatile memory 218, a volatile memory 216, volatile memory tag unit 222, volatile memory tag unit 224, a multiplexing data buffer 232, a multiplexing data buffer 234, a multiplexing data buffer 236, a multiplexing data buffer 238, and a cache controller 242. In various embodiments, the number of non-volatile memories, volatile memories, volatile memory tag units, multiplexing data buffers, controllers, external parallel data buses, and external command-address buses may vary depending upon, for example, system design.

In various embodiments, the non-volatile memories are not limited to those described herein and may include, for example, other types of non-volatile storage. Examples of non-volatile memory may include solid-state memory, such as, for example, NAND flash storage, NOR flash storage, Phase Change Memory (PCM), and Spin-Transfer Torque (STT) Magnetic Random-Access Memory (MRAM). In various embodiments, line 271 and line 281 are data buses between non-volatile memory 202 and multiplexing data buffer 232, and non-volatile memory 208 and multiplexing data buffer 238, respectively. In various embodiments, line 272, line 274, line 282, and line 284 are command-address buses between non-volatile memory 202, non-volatile memory 204, non-volatile memory 208, non-volatile memory 206 and cache controller 242. In various embodiments, line 275 and line 285 are command-address-data buses between volatile memory tag unit 222 and cache controller 242, and volatile memory tag unit 224 and cache controller 242, respectively. In various embodiments, line 270 is a data bus between volatile memory 212 and multiplexing data buffer 232, line 278 is a data bus between volatile memory 214 and multiplexing data buffer 234, line 288 is a data bus between volatile memory 218 and multiplexing data buffer 236, and line 280 is a data bus between volatile memory 216 and multiplexing data buffer 238. Line 291 is a command bus between multiplexing data buffer 232 and cache controller 242. Line 279 is a command bus between multiplexing data buffer 234 and cache controller 242. Line 292 is a command bus between multiplexing data buffer 238 and cache controller 242. Line 289 is a command bus between multiplexing data buffer 236 and cache controller 242. Line 276 is a command-address bus between volatile memory 214 and cache controller 242. Line 277 is a command-address bus between volatile memory 212 and cache controller 242. Line 286 is a command-address bus between volatile memory 218 and cache controller 242. Line 287 is a command-address bus between volatile memory 216 and cache controller 242.

In various embodiments, non-volatile memory 202 is coupled to multiplexing data buffer 232 via line 271. Non-volatile memory 202 is coupled to cache controller 242 via line 272. In various embodiments, non-volatile memory 204 is coupled to multiplexing data buffer 234 via line 273. Non-volatile memory 204 is coupled to cache controller 242 via line 274. Non-volatile memory 208 is coupled to multiplexing data buffer 238 via line 281. Non-volatile memory 208 is coupled to cache controller 242 via line 282. Non-volatile memory 206 is coupled to multiplexing data buffer 236 via line 283. Non-volatile memory 206 is coupled to cache controller 242 via line 284.

The volatile memories described herein (e.g., volatile memory 212, volatile memory 214, volatile memory 216, volatile memory 218, etc.) are used to store data caches originating from host 101. An example of volatile memory may include, for example, Dynamic Random-Access Memory (DRAM). In various embodiments, volatile memory 212 is coupled to multiplexing data buffer 232 via line 270. Volatile memory 212 is coupled to cache controller 242 via line 277. Volatile memory 214 is coupled to multiplexing data buffer 234 via line 278. Volatile memory 214 is coupled to cache controller 242 via line 276. Volatile memory 216 is coupled to multiplexing data buffer 238 via line 280. Volatile memory 216 is coupled to cache controller 242 via line 287. Volatile memory 218 is coupled to multiplexing data buffer 236 via line 288. Volatile memory 218 is coupled to cache controller 242 via line 286.

In various embodiments, the volatile memory tag units described herein (e.g., volatile memory tag unit 222 and volatile memory tag unit 224) are, for example, volatile memory that is used to store DRAM cache tags. In various embodiments, volatile memory tag unit 222 is coupled to cache controller 242 via line 275. Volatile memory tag unit 224 is coupled to cache controller 242 via line 285. Line 271 and line 273 serve as a data connection between non-volatile memory 202, non-volatile memory 204 and multiplexing data buffer 232 and multiplexing data buffer 234 respectively. In various embodiments, line 283 and line 281 serve as a data connection between non-volatile memory 206 and non-volatile memory 208 and multiplexing data buffer 236 and multiplexing data buffer 238, respectively.

Multiplexing data buffers (e.g., multiplexing data buffer 232, multiplexing data buffer 234, multiplexing data buffer 236, multiplexing data buffer 238) are used to buffer data provided from, for example, non-volatile memories 202, 204, 206, and/or 208, volatile memories 212, 214, 218, and/or 216, external parallel data buses 252 and/or 256, and cache controller 242. In various embodiments, multiplexing data buffer 232 is coupled to cache controller 242 via line 291. Multiplexing data buffer 232 is coupled to external parallel data bus 252 via lines 295. Multiplexing data buffer 234 is coupled to cache controller 242 via line 279. Multiplexing data buffer 234 is coupled to external parallel data bus 252 via lines 296. Multiplexing data buffer 238 is coupled to cache controller 242 via line 292. Multiplexing data buffer 238 is coupled to external parallel data bus 256 via lines 298. Multiplexing data buffer 236 is coupled to cache controller 242 via line 289. Multiplexing data buffer 236 is coupled to external parallel data bus 256 via lines 297. In various embodiments, line 291 and line 279 may serve as, for example, a command bus between the cache controller 242 and multiplexing data buffer 232 and/or multiplexing data buffer 234 (data buffer dies). Line 292 and line 289 may serve as, for example, a command bus between the cache controller 242 and multiplexing data buffer 236 and/or multiplexing data buffer 238 (data buffer dies). In various embodiments, the command bus is similar to, for example, the side-channel command (BCOM) bus in the DDR4 LRDIMM standard. Cache controller 242 is coupled to external command-address bus 254 via line 299.

In various embodiments, the types of dies used for multiplexing data buffers, non-volatile memories, volatile memories, volatile memory tag units, and controllers may vary based on the design of the hybrid memory module. In various embodiments, for example, volatile memory tag units are implemented in SRAM, embedded DRAM (eDRAM), or low-latency DRAM to provide faster access time. In various embodiments, volatile memory is implemented in high-density DRAM to provide large capacity. In addition, in various embodiments, volatile memory tags (DRAM tags) stored in volatile memory tag units and volatile memory data (DRAM data) stored in volatile memory are placed in separate dies or placed in the same dies. In various embodiments, DRAM tags and DRAM data are placed in separate DRAM dies to simplify connections between different controller and DRAM dies and to optimize each die for its target functionality.

During operation of hybrid memory module 221, host memory controller 104 provides external command signals and address signals to cache controller 242 via external command-address bus 254 and line 299. Cache controller 242 receives the external command signals and address signals via line 299 and uses the signals to control data movement in and out of the DIMM. Cache controller 242 buffers the command signals and address signals sent by host memory controller 104 and generates cache operation commands. Cache controller 242 processes the commands provided from the external command-address bus 254 and generates cache commands to provide to the non-volatile memory components, volatile memory components, multiplexing data buffer components, and volatile tag memory components of hybrid memory module 221.

In various embodiments, a read command with an address is received by cache controller 242 from host 101. Cache controller 242 generates a tag read command to read the contents of volatile memory tag unit 222 and/or volatile memory tag unit 224. In various embodiments, tags are located in a reserved portion of volatile memory instead of dedicated volatile memory tag unit 222 and/or volatile memory tag unit 224. In response, volatile memory tag unit 222 and/or volatile memory tag unit 224 provide tag data responses to cache controller 242. The tag data responses contain the contents of tags and are provided to cache controller 242. Cache controller 242 assesses the tags to determine whether the tag is a hit or a miss in volatile memory 212, volatile memory 214, volatile memory 218, and/or volatile memory 216. When a hit occurs, cache controller 242 becomes aware that data exists in volatile memory 212, volatile memory 214, volatile memory 218, and/or volatile memory 216 and fetches the corresponding data from the corresponding volatile memory. The data, if any, fetched from volatile memory 212 and/or volatile memory 214 is provided to multiplexing data buffer 232 and/or multiplexing data buffer 234. Simultaneously, the data, if any, fetched from volatile memory 218 and/or volatile memory 216 is provided to multiplexing data buffer 236 and/or multiplexing data buffer 238. When data is received by multiplexing data buffer 232 and/or multiplexing data buffer 234, and simultaneously by multiplexing data buffer 236 and/or multiplexing data buffer 238, cache controller 242 sends a command to host 101 to notify host 101 that data has been received by multiplexing data buffer 232, multiplexing data buffer 234, multiplexing data buffer 236, and/or multiplexing data buffer 238 and is ready for retrieval by host 101.

In various embodiments, when a miss occurs, cache controller 242 becomes aware that data exists in non-volatile memory 202, non-volatile memory 204, non-volatile memory 206, and/or non-volatile memory 208 and fetches the corresponding data from the corresponding non-volatile memory. The data, if any, fetched from non-volatile memory 202 and/or non-volatile memory 204 is sent directly to multiplexing data buffer 232 and/or multiplexing data buffer 234 via lines 271 and 273. Simultaneously, the data, if any, fetched from non-volatile memory 206 and/or non-volatile memory 208 is sent directly to multiplexing data buffer 236 and/or multiplexing data buffer 238 via lines 283 and 281. When data is received by multiplexing data buffer 232, multiplexing data buffer 234, multiplexing data buffer 236, and/or multiplexing data buffer 238, cache controller 242 sends a command to host 101 to notify host 101 that data has been received by multiplexing data buffer 232, multiplexing data buffer 234, multiplexing data buffer 236, and/or multiplexing data buffer 238, and is ready for retrieval by host 101.

In various embodiments, cache controller 242 sends commands to multiplexing data buffer 232, multiplexing data buffer 234, multiplexing data buffer 236, and/or multiplexing data buffer 238 to orchestrate copying of data from multiplexing data buffer 232, multiplexing data buffer 234, multiplexing data buffer 236, and/or multiplexing data buffer 238 to volatile memory 212, volatile memory 214, volatile memory 218, and/or volatile memory 216. In various embodiments, such an orchestration by cache controller 242 may be necessary so that future accesses to the same data result in a hit. In various embodiments, the tag checking operations by the cache controller 242 and the data movement operations through data buffers described herein occur in parallel, allowing for an increase in memory bandwidth to retrieve and store data by hybrid memory module 221.
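The read flow described above (tag check, hit served from volatile memory, miss served from non-volatile memory and then copied into volatile memory) can be illustrated with a minimal software sketch. This is only a toy model under stated assumptions, not the disclosed hardware: the direct-mapped organization, the `NUM_SETS` value, and the names `HybridModuleModel` and `read` are hypothetical and do not appear in the disclosure.

```python
from dataclasses import dataclass, field

NUM_SETS = 4  # hypothetical cache size, chosen only for illustration


@dataclass
class HybridModuleModel:
    """Toy model of the FIG. 2 read path; all names are hypothetical."""

    nvm: dict                                  # non-volatile memory: address -> data
    tags: dict = field(default_factory=dict)   # tag unit: set index -> stored tag
    dram: dict = field(default_factory=dict)   # volatile memory: set index -> data
    buffer: object = None                      # contents of the multiplexing data buffer

    def read(self, addr):
        # Assumed direct-mapped split of the address into set index and tag.
        index, tag = addr % NUM_SETS, addr // NUM_SETS
        # Tag check: the controller compares the stored tag with the request.
        if self.tags.get(index) == tag:
            # Hit: volatile memory -> data buffer.
            self.buffer = self.dram[index]
            return "hit", self.buffer
        # Miss: non-volatile memory -> data buffer, bypassing volatile memory.
        self.buffer = self.nvm[addr]
        # Fill: data buffer -> volatile memory, so that a later access hits.
        self.dram[index] = self.buffer
        self.tags[index] = tag
        return "miss", self.buffer
```

For example, with `module = HybridModuleModel(nvm={5: "A", 9: "B"})`, the first `module.read(5)` is a miss and the second is a hit. Addresses 5 and 9 map to the same set in this toy model, so reading 9 replaces the line and a later read of 5 misses again. Writes and dirty lines are deliberately left out of the sketch.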

In various embodiments, during a tag checking operation, cache controller 242 retrieves and stores tags from volatile memory tag unit 222 and/or volatile memory tag unit 224. Cache controller 242 may have dedicated bus connections (line 275 and line 285) to volatile memory tag unit 222 and volatile memory tag unit 224 (tag dies) to retrieve and store the tags. When, for example, volatile memory tag unit 222 and volatile memory tag unit 224 (DRAM tag dies) are commodity DRAM dies, the connection interface may follow standard DDR interface with data, command, address, clock signals being transmitted via line 299 from external command-address bus 254. In various embodiments, the connection interface may comply with custom interface standards designed for accessing DRAM tag dies. In various embodiments, volatile memory tag unit 222 and volatile memory tag unit 224 (DRAM tag dies) may have connections to, for example, only cache controller 242. In various embodiments, multiplexing data buffer 232, multiplexing data buffer 234, multiplexing data buffer 236, and/or multiplexing data buffer 238 may not have direct access to volatile memory tag unit 222 and volatile memory tag unit 224 (e.g., DRAM tag dies) as depicted in FIG. 2.

Note that, in various embodiments, during a data movement operation, cache controller 242 directs data movement in and out of non-volatile memory 202, non-volatile memory 204, non-volatile memory 206, and/or non-volatile memory 208. In various embodiments, cache controller 242 manages data movements between the various components and is centralized or distributed. In various embodiments, cache controller 242, multiplexing data buffer 232, multiplexing data buffer 234, multiplexing data buffer 236, and multiplexing data buffer 238 are integrated into a single die.

In various embodiments, data communication between volatile memory 212, volatile memory 214, volatile memory 218, volatile memory 216 (DRAM data dies) and non-volatile memory 202, non-volatile memory 204, non-volatile memory 206, and/or non-volatile memory 208 (NVM dies) occurs through multiplexing data buffer 232, multiplexing data buffer 234, multiplexing data buffer 236, and/or multiplexing data buffer 238 (DB dies). In various embodiments, communication between the DRAM data dies and the non-volatile memory components is orchestrated by cache controller 242. In various embodiments, data movement from/to non-volatile memory 202, non-volatile memory 204, non-volatile memory 206, and/or non-volatile memory 208 bypasses volatile memory 212, volatile memory 214, volatile memory 218, and/or volatile memory 216 (DRAM dies), using, for example, line 271, line 273, line 283, line 281, respectively.

In various embodiments, during operation of hybrid memory module 221, multiple distributed buffers (e.g., multiplexing data buffer 232, multiplexing data buffer 234, multiplexing data buffer 236, and/or multiplexing data buffer 238) buffer and multiplex data signals between volatile memory, non-volatile memory, and/or host 101. That is, multiplexing data buffer 232, multiplexing data buffer 234, multiplexing data buffer 236, and/or multiplexing data buffer 238 operate as both data buffers and data multiplexers, multiplexing data between volatile memory 212, volatile memory 214, volatile memory 218, and/or volatile memory 216, and non-volatile memory 202, non-volatile memory 204, non-volatile memory 206, and/or non-volatile memory 208, and host 101. In various embodiments, having the multiplexing data buffers (e.g., multiplexing data buffer 232, multiplexing data buffer 234, multiplexing data buffer 236, and/or multiplexing data buffer 238) serve as multiplexers allows the data buffers to transfer data directly between the non-volatile memories and the volatile memories without moving data through cache controller 242. Thus, multiplexing data buffer 232, multiplexing data buffer 234, multiplexing data buffer 236, and/or multiplexing data buffer 238 serve as a bridge for data movement not only between volatile memory 212, volatile memory 214, volatile memory 218, and/or volatile memory 216, and host 101, but also between volatile memory 212, volatile memory 214, volatile memory 218, and/or volatile memory 216 and non-volatile memory 202, non-volatile memory 204, non-volatile memory 206, and/or non-volatile memory 208. The location and/or number of the multiplexing data buffers, non-volatile memories, volatile memories, and volatile memory tag units may vary according to various embodiments. In various embodiments, memory and NVM dies are structured atop an interposer instead of a DIMM printed circuit board (PCB). Thus, the hybrid memory system 200 supports improved interconnect power and higher bandwidth between stacked memory and NVM dies.
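The bridging role of a multiplexing data buffer can be sketched in software as a three-port switch that the controller steers with commands while the data itself stays out of the controller. The sketch below is a simplified illustration under assumptions: the port names, the `route` and `command` methods, and the class names are hypothetical and are not taken from the disclosure.

```python
class MuxDataBuffer:
    """Toy multiplexing data buffer with three data ports (names hypothetical)."""

    PORTS = ("host", "dram", "nvm")

    def __init__(self):
        # Data currently presented on each port.
        self.ports = {name: None for name in self.PORTS}

    def route(self, src, dst):
        """Move whatever is presented on port src to port dst."""
        if src not in self.PORTS or dst not in self.PORTS or src == dst:
            raise ValueError(f"invalid route {src} -> {dst}")
        self.ports[dst] = self.ports[src]
        return self.ports[dst]


class ControllerModel:
    """Issues route commands only; the data never passes through this object."""

    def __init__(self, data_buffer):
        self.data_buffer = data_buffer
        self.command_log = []  # the controller sees commands, not payloads

    def command(self, src, dst):
        self.command_log.append((src, dst))
        self.data_buffer.route(src, dst)
```

As a usage example, after `db = MuxDataBuffer()`, `ctrl = ControllerModel(db)`, and `db.ports["nvm"] = "cache line"`, the calls `ctrl.command("nvm", "dram")` and `ctrl.command("nvm", "host")` place the line on both the volatile memory port and the host port, while `ctrl.command_log` records only the two route commands.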

FIG. 3 illustrates hybrid memory system 300 according to various embodiments. Hybrid memory system 300 includes a hybrid memory module 321, an external parallel data bus 352, an external command-address bus 354, and an external parallel data bus 356. The configuration described in hybrid memory module 321 is, for example, an NVDIMM-P configuration. Hybrid memory module 321 includes a non-volatile memory 302, a non-volatile memory 304, a non-volatile memory 306, a non-volatile memory 308, a volatile memory 312, a volatile memory 314, a volatile memory 318, a volatile memory 316, volatile memory tag unit 322, volatile memory tag unit 324, an integrated control buffer 332, an integrated control buffer 338, and a controller 342. Integrated control buffer 332 includes cache operation controller logic 327. Integrated control buffer 338 includes cache operation controller logic 328.

In various embodiments, line 371 is a command-address-data bus between non-volatile memory 302 and integrated control buffer 332. Line 373 is a command-address-data bus between non-volatile memory 304 and integrated control buffer 332. Line 370 is a command-address-data bus between volatile memory 312 and integrated control buffer 332. Line 378 is a command-address-data bus between volatile memory 314 and integrated control buffer 332. Line 375 is a command-address-data bus between volatile memory tag unit 322 and integrated control buffer 332. In various embodiments, line 381 is a command-address-data bus between non-volatile memory 308 and integrated control buffer 338. Line 383 is a command-address-data bus between non-volatile memory 306 and integrated control buffer 338. Line 380 is a command-address-data bus between volatile memory 316 and integrated control buffer 338. Line 388 is a command-address-data bus between volatile memory 318 and integrated control buffer 338. Line 385 is a command-address-data bus between volatile memory tag unit 324 and integrated control buffer 338. In various embodiments, line 379 is a command bus between integrated control buffer 332 and controller 342. In various embodiments, line 389 is a command bus between integrated control buffer 338 and controller 342.

In various embodiments, non-volatile memory 302 is coupled to integrated control buffer 332 via line 371. In various embodiments, non-volatile memory 304 is coupled to integrated control buffer 332 via line 373. Non-volatile memory 306 is coupled to integrated control buffer 338 via line 383. Non-volatile memory 308 is coupled to integrated control buffer 338 via line 381. Volatile memory 312 is coupled to integrated control buffer 332 via line 370. Volatile memory 314 is coupled to integrated control buffer 332 via line 378. Volatile memory 316 is coupled to integrated control buffer 338 via line 380. Volatile memory 318 is coupled to integrated control buffer 338 via line 388. Volatile memory tag unit 322 is coupled to integrated control buffer 332 via line 375. Volatile memory tag unit 324 is coupled to integrated control buffer 338 via line 385.

Integrated control buffer 332 and integrated control buffer 338 are intelligent control buffers that are enhanced to perform cache operations. That is, integrated control buffer 332 and integrated control buffer 338 are integrated with cache integration logic (e.g., first cache integration logic 391 and second cache integration logic 392) to perform cache operations and data multiplexing between non-volatile memory, volatile memory, volatile memory tag units, the controller, and/or the host. Integrated control buffer 332 is coupled to controller 342 via line 379. Integrated control buffer 332 is coupled to external parallel data bus 352 via lines 395 and lines 396. Integrated control buffer 338 is coupled to controller 342 via line 389. Integrated control buffer 338 is coupled to external parallel data bus 356 via lines 397 and lines 398. Controller 342 is coupled to external command-address bus 354 via line 399.

In various embodiments, during operation of hybrid memory module 321, host memory controller 104 provides external command signals and address signals to controller 342 via external command-address bus 354 and line 399. Controller 342 receives the external command signals and address signals from external command-address bus 354 and uses the signals to control data movement in and out of the DIMM. Controller 342 buffers the command signals and address signals sent by host memory controller 104 and generates cache operation commands for use by integrated control buffer 332 and integrated control buffer 338.

Controller 342 provides the cache operation commands to integrated control buffer 332 and integrated control buffer 338 using line 379 and line 389, respectively. Integrated control buffer 332 and integrated control buffer 338 receive the cache operation commands. Integrated control buffer 332 and integrated control buffer 338 process the commands received from controller 342 and generate appropriate commands to access the non-volatile memory components, volatile memory components, data buffer components, and volatile tag memory components of hybrid memory module 321. Cache operations performed by integrated control buffer 338 and integrated control buffer 332 may include, for example, tag check operations and cache line replacement operations.

For example, during a tag check operation, a tag check command is first received by integrated control buffer 332 and/or integrated control buffer 338. Integrated control buffer 332 and/or integrated control buffer 338 generate a read command to read the contents of volatile memory tag unit 322 and/or volatile memory tag unit 324. In response, volatile memory tag unit 322 and/or volatile memory tag unit 324 provide tag data responses to integrated control buffer 332 and/or integrated control buffer 338. The tag data responses contain the contents of tags of volatile memory tag unit 322 and/or volatile memory tag unit 324 and are provided to integrated control buffer 332 and/or integrated control buffer 338. Integrated control buffer 332 and/or integrated control buffer 338 check the tags to determine whether each tag is a hit or a miss in volatile memory 312, volatile memory 314, volatile memory 318, and/or volatile memory 316. After integrated control buffer 332 and/or integrated control buffer 338 check the tags, signals are sent back to controller 342 to indicate a hit/miss signal for each way of the cache row accessed. Controller 342 issues another cache operation command to integrated control buffer 332 and/or integrated control buffer 338.
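The per-way hit/miss signaling described above can be illustrated with a short sketch. It assumes, purely for illustration, that each way of a cache row stores a valid bit alongside its tag; the valid bit and the function names `tag_check` and `hit_way` are hypothetical and are not specified in the disclosure.

```python
def tag_check(tag_row, requested_tag):
    """Compare the requested tag against every way of one cache row.

    tag_row holds (valid, tag) pairs, one per way. The result is one
    hit (True) or miss (False) flag per way, in way order.
    """
    return [valid and tag == requested_tag for valid, tag in tag_row]


def hit_way(flags):
    """Return the index of the hitting way, or None when every way misses."""
    return flags.index(True) if True in flags else None
```

For a four-way row such as `[(True, 0x1A), (False, 0x2B), (True, 0x2B), (True, 0x3C)]`, a check for tag `0x2B` skips the invalid second way and reports a hit only in the third way, so `hit_way` yields index 2. A check for a tag that is not stored reports a miss in every way.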

When a miss occurs, controller 342 sends a command to the integrated control buffers (integrated control buffer 332 and/or integrated control buffer 338) to instruct them to read data from non-volatile memory. Controller 342 also sends a command to the integrated control buffers to copy the data to the volatile memories and orchestrate a cache line replacement. When a hit occurs, controller 342 sends a command to the integrated control buffers to instruct them to read data from the volatile memories. The integrated control buffers act only as instructed by controller 342 and do not fetch data on their own. When data is received by integrated control buffer 332 and integrated control buffer 338, controller 342 sends a command to host 101 to notify host 101 that the data has been received by integrated control buffer 332 and integrated control buffer 338 and is ready for retrieval by host 101.
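
As a minimal sketch under stated assumptions, the sequence of steps taken by the controller on a hit or a miss may be modeled as below. The `Cmd` names and the `commands_for_read` function are hypothetical labels chosen for illustration; the description does not define a command encoding.

```python
from enum import Enum, auto
from typing import List


class Cmd(Enum):
    """Hypothetical labels for the controller's steps; not a defined encoding."""
    READ_NVM = auto()     # read the line from non-volatile memory
    FILL_DRAM = auto()    # copy the line to volatile memory (line replacement)
    READ_DRAM = auto()    # read the line from volatile memory
    NOTIFY_HOST = auto()  # tell the host that data is ready for retrieval


def commands_for_read(way_hits: List[bool]) -> List[Cmd]:
    """Choose the controller's steps from the per-way hit/miss signals."""
    if any(way_hits):
        return [Cmd.READ_DRAM, Cmd.NOTIFY_HOST]
    return [Cmd.READ_NVM, Cmd.FILL_DRAM, Cmd.NOTIFY_HOST]


hit_sequence = commands_for_read([False, True, False, False])
miss_sequence = commands_for_read([False, False, False, False])
```

The buffers in this sketch never choose a data source themselves; the decision is made entirely from the signals returned to the controller.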

In various embodiments, a tag check command is sent by controller 342 to integrated control buffer 332 and/or integrated control buffer 338 that invokes logic in integrated control buffer 332 and/or integrated control buffer 338 to read out a designated tag address and perform a parallel tag check. The tag checking operations conducted by integrated control buffer 332 and/or integrated control buffer 338 may occur in parallel with data movement operations since, for example, tag and data may use separate connections (e.g., tag checking address XX0 in parallel with reading data from address XX1). In various embodiments, controller 342 may send a single cache operation command or multiple cache operations at any given time. For example, in various embodiments, when or if cache tags are distributed among separate volatile memory tag dies (no tag striping), parallel data buffer accesses to volatile memory tag dies may occur. Controller 342 may instruct integrated control buffer 338 and integrated control buffer 332 to access different tags in parallel.
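
The effect of overlapping tag checks with data movement may be illustrated with a simple two-stage timing model. The model, the function names, and the latency values are assumptions made for this sketch only; the description does not state any latencies.

```python
def serial_time(num_requests: int, t_tag: int, t_data: int) -> int:
    """Total time when each tag check and data move share one connection."""
    return num_requests * (t_tag + t_data)


def overlapped_time(num_requests: int, t_tag: int, t_data: int) -> int:
    """Total time when the tag check for request i+1 runs on a separate
    connection while the data for request i is still moving."""
    if num_requests == 0:
        return 0
    # First tag check, then the slower stage paces the pipeline,
    # then the final data move drains.
    return t_tag + (num_requests - 1) * max(t_tag, t_data) + t_data


# Hypothetical latencies in arbitrary time units.
t_serial = serial_time(4, t_tag=10, t_data=30)
t_overlap = overlapped_time(4, t_tag=10, t_data=30)
```

With these assumed values the overlapped schedule finishes earlier because only the first tag check is exposed; the remaining tag checks are hidden behind data movement.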

In various embodiments, during operation of hybrid memory module 321, integrated control buffer 332 and/or integrated control buffer 338 may generate output signals that are provided from integrated control buffer 332 and/or integrated control buffer 338 to controller 342. The output signals may include, for example, hit/miss signals that are provided to controller 342 via line 379 and line 389. In various embodiments, controller 342 may issue a cache operation command to integrated control buffer 332 and/or integrated control buffer 338 to move data between a non-volatile memory and volatile memory (e.g., NVM-DRAM) pair to, for example, load a cache line from non-volatile memory to volatile memory in case of a miss, write back a dirty cache line to non-volatile memory on eviction, or read a cache line from DRAM in case of a hit. Integrated control buffer 332 and integrated control buffer 338 may also be instructed by controller 342 to perform operations such as data shuffling, encryption, and memory processing.

In various embodiments, integrated control buffer 332 and/or integrated control buffer 338 may support a plurality of different cache operation commands, such as, for example, loading critical word first, invalidating a cache line, and cache-bypassed writes (non-cached writes). In various embodiments, the commands may be, for example, required for any on-DIMM cache. In various embodiments, the cache operation controller logic 327 and cache operation controller logic 328 are placed inside integrated control buffer 332 and/or integrated control buffer 338 to perform cache operations. In addition, each integrated control buffer may be, for example, in control of a volatile memory and a non-volatile memory pair (DRAM and NAND) of chips or channels and each integrated control buffer may be instructed by the controller 342 to perform cache operations independently from other integrated control buffers. In various embodiments, integrated control buffer 332 and/or integrated control buffer 338 may, for example, buffer data and perform cache operations instructed by controller 342. Integrated control buffer 332 and/or integrated control buffer 338 may perform data multiplexing among various cache operations and/or different die as instructed by controller 342.
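
For illustration only, a critical-word-first load may be sketched as a transfer order in which the requested word is sent first and the remaining words follow with wrap-around. The wrap-around order, the 64-byte line size, the 8-byte word size, and the function name are assumptions for this sketch; the description does not specify a burst order.

```python
from typing import List


def critical_word_first_order(offset_bytes: int,
                              line_bytes: int = 64,
                              word_bytes: int = 8) -> List[int]:
    """Return word indices in transfer order, starting at the requested word."""
    if line_bytes % word_bytes != 0:
        raise ValueError("line size must be a multiple of the word size")
    num_words = line_bytes // word_bytes
    first = (offset_bytes % line_bytes) // word_bytes
    return [(first + i) % num_words for i in range(num_words)]


# A request for byte offset 40 falls in word 5 of the line.
order = critical_word_first_order(40)
```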

FIG. 4 illustrates hybrid memory system 400 according to various embodiments. Hybrid memory system 400 includes a hybrid memory module 421, an external parallel data bus 452, an external command-address bus 454, and an external parallel data bus 456. The configuration described in hybrid memory module 421 is, for example, an NVDIMM-P configuration. Hybrid memory module 421 includes a non-volatile memory 402, a non-volatile memory 404, a non-volatile memory 406, a non-volatile memory 408, a volatile memory 412, a volatile memory 414, a volatile memory 418, a volatile memory 416, volatile memory tag unit 422, volatile memory tag unit 424, a control data buffer 432, and a control data buffer 438.

In various embodiments, line 471 is a command-address-data bus between non-volatile memory 402 and control data buffer 432. In various embodiments, line 473 is a command-address-data bus between non-volatile memory 404 and control data buffer 432. In various embodiments, line 483 is a command-address-data bus between non-volatile memory 406 and control data buffer 438. In various embodiments, line 481 is a command-address-data bus between non-volatile memory 408 and control data buffer 438. In various embodiments, line 470 is a command-address-data bus between volatile memory 412 and control data buffer 432. In various embodiments, line 478 is a command-address-data bus between volatile memory 414 and control data buffer 432. Line 488 is a command-address-data bus between volatile memory 418 and control data buffer 438. Line 480 is a command-address-data bus between volatile memory 416 and control data buffer 438. In various embodiments, line 475 is a command-address-data bus between volatile memory tag unit 422 and control data buffer 432. Line 485 is a command-address-data bus between volatile memory tag unit 424 and control data buffer 438.

In various embodiments, non-volatile memory 402 is coupled to control data buffer 432 via line 471. Non-volatile memory 404 is coupled to control data buffer 432 via line 473. Non-volatile memory 406 is coupled to control data buffer 438 via line 483. Non-volatile memory 408 is coupled to control data buffer 438 via line 481. Volatile memory 412 is coupled to control data buffer 432 via line 470. Volatile memory 414 is coupled to control data buffer 432 via line 478. Volatile memory 416 is coupled to control data buffer 438 via line 480. Volatile memory 418 is coupled to control data buffer 438 via line 488. Volatile memory tag unit 422 is coupled to control data buffer 432 via line 475. Volatile memory tag unit 424 is coupled to control data buffer 438 via line 485.

In various embodiments, control data buffer 432 and control data buffer 438 not only buffer data, but also have cache controller functionality integrated into control data buffer 432 and control data buffer 438. Thus, control data buffer 432 and control data buffer 438 are enhanced to perform cache operations such as, for example, tag checking operations. Control data buffer 432 is coupled to external command-address bus 454 via line 479. Control data buffer 432 is coupled to external parallel data bus 452 via lines 495 and lines 496. Control data buffer 438 is coupled to external command-address bus 454 via line 479. Control data buffer 438 is coupled to external parallel data bus 456 via lines 497 and lines 498.

In various embodiments, during operations of hybrid memory module 421, host memory controller 104 provides external command signals and address signals to control data buffer 432 and/or control data buffer 438 via external command-address bus 454 and line 479. Control data buffer 432 and/or control data buffer 438 receive the external command signals and address signals from external command-address bus 454 and use the signals to control data movement in and out of hybrid memory module 421. Control data buffer 432 and/or control data buffer 438 buffer the command signals and address signals sent by host memory controller 104 and generate cache operation commands for use within hybrid memory module 421.

Control data buffer 432 and/or control data buffer 438 process the commands received from external command-address bus 454 to manage data communication between non-volatile memory components, volatile memory components, data buffer components, and/or volatile tag memory components of hybrid memory module 421. Cache operations performed by control data buffer 432 and/or control data buffer 438 may include, for example, tag check commands and replacement operation commands.

During a read operation triggered by the host memory controller 104, control data buffer 432 and/or control data buffer 438 generate a read command to read the contents of volatile memory tag unit 422 and/or volatile memory tag unit 424. In response, volatile memory tag unit 422 and/or volatile memory tag unit 424 provide tag data responses to control data buffer 432 and/or control data buffer 438. The tag data responses contain the contents of tags and are provided to control data buffer 432 and/or control data buffer 438. Control data buffer 432 and/or control data buffer 438 assess the tags to determine whether each tag is a hit or a miss in volatile memory 412, volatile memory 414, volatile memory 418, and/or volatile memory 416. When a hit occurs, control data buffer 432 and/or control data buffer 438 become aware that data exists in volatile memory 412, volatile memory 414, volatile memory 418, and/or volatile memory 416 and fetch the corresponding data from the corresponding volatile memory. The data, if any, fetched from volatile memory 412 and/or volatile memory 414 is provided to control data buffer 432. The data, if any, fetched from volatile memory 418 and/or volatile memory 416 is provided to control data buffer 438. When data is received by control data buffer 432, control data buffer 432 sends a command to host 101 to notify host 101 that data has been received by control data buffer 432 and is ready for retrieval by host 101. When data is received by control data buffer 438, control data buffer 438 sends a command to host 101 to notify host 101 that data has been received by control data buffer 438 and is ready for retrieval by host 101.
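
As a toy model only, the read path handled locally by a control data buffer may be sketched as below. The direct-mapped organization, the `ControlDataBuffer` class, and its methods are assumptions introduced for illustration; only the hit case is modeled, since the paragraph above describes only the hit case.

```python
from typing import Dict, Optional


class ControlDataBuffer:
    """Toy control data buffer that checks its own tags (assumed direct-mapped)."""

    def __init__(self, num_rows: int) -> None:
        self.num_rows = num_rows
        self.tags: Dict[int, int] = {}    # row -> tag; stands in for the tag unit
        self.dram: Dict[int, bytes] = {}  # row -> cached line; stands in for DRAM
        self.host_ready = False           # set when data is ready for the host

    def fill(self, line_addr: int, data: bytes) -> None:
        """Install a line so that later reads of the same address hit."""
        row, tag = line_addr % self.num_rows, line_addr // self.num_rows
        self.tags[row] = tag
        self.dram[row] = data

    def read(self, line_addr: int) -> Optional[bytes]:
        """Check the tag; on a hit, fetch from DRAM and flag the host."""
        row, tag = line_addr % self.num_rows, line_addr // self.num_rows
        if self.tags.get(row) == tag:
            self.host_ready = True
            return self.dram[row]
        return None  # miss handling is outside the scope of this sketch


buf = ControlDataBuffer(num_rows=4)
buf.fill(5, b"cached line")
hit_data = buf.read(5)
miss_data = buf.read(9)  # maps to the same row as address 5 but has a different tag
```

Unlike the centralized-controller organization, the buffer in this sketch performs the tag check and the fetch itself.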

In various embodiments, hybrid memory module 421 may include one, multiple or a plurality of control data buffers. For example, hybrid memory module 421 may include one, two or more control data buffers. In various embodiments, control data buffer 432 and control data buffer 438 read designated tags and conduct a parallel tag check for multiple cache ways. Control data buffer 432 and control data buffer 438 instruct independent NVM-DRAM data movement, DRAM-host data movement, or even NVM-host data movement in case DRAM is bypassed.

In various embodiments, control data buffer 432 and control data buffer 438 retrieve tags from volatile memory tag unit 422 and volatile memory tag unit 424 (e.g., DRAM tag dies) via dedicated bus connections, line 475 and line 485, respectively. In various embodiments, control data buffer 432 and control data buffer 438 perform tag access in parallel with data access and have dedicated bus connections from/to volatile memory and non-volatile memory (e.g., DRAM and NVM dies). For example, control data buffer 432 may have a dedicated bus line 470 to volatile memory 412 and a dedicated bus line 478 to volatile memory 414. Control data buffer 438 may have a dedicated bus line 488 to volatile memory 418 and a dedicated bus line 480 to volatile memory 416. Control data buffer 432 may have a dedicated bus line 471 to non-volatile memory 402 and a dedicated bus line 473 to non-volatile memory 404. Control data buffer 438 may have a dedicated bus line 483 to non-volatile memory 406 and a dedicated bus line 481 to non-volatile memory 408. Control data buffer 432 and control data buffer 438 may also perform operations such as, for example, data shuffling, encryption, checksum calculation, and processing in memory.

In various embodiments, control data buffer 432 and control data buffer 438 may each form a separate data channel between host 101 and hybrid memory module 421. Each channel may perform operations independently and in parallel with other channels. In various embodiments, within a channel, tag checking operations are performed in parallel with DRAM access, NVM access, or data transfer operations between host 101 and hybrid memory module 421. In various embodiments, for the distributed controller organization of control data buffer 432 and control data buffer 438, only dies belonging to each channel are interconnected. In various embodiments, control data buffer 432 and control data buffer 438 may have an associated command address bus within hybrid memory module 421 for providing command and address signals.

FIG. 5 illustrates hybrid memory system 500 according to various embodiments. Hybrid memory system 500 includes a hybrid memory module 521, an external parallel data bus 552, an external command-address bus 554, and an external parallel data bus 556. The configuration described in hybrid memory module 521 is, for example, an NVDIMM-P configuration. Hybrid memory module 521 includes a non-volatile memory 502, a non-volatile memory 504, a non-volatile memory 506, a non-volatile memory 508, and a stacked volatile memory 512. In various embodiments, non-volatile memory 502 is referred to as first non-volatile memory 502, non-volatile memory 504 is referred to as second non-volatile memory 504, non-volatile memory 506 is referred to as third non-volatile memory 506, and non-volatile memory 508 is referred to as fourth non-volatile memory 508.

Stacked volatile memory 512 is depicted in FIG. 5 as a 3-dimensional block including a first layer 577 and a second layer 578. In various embodiments, stacked volatile memory 512 may include more than two layers. In various embodiments, first layer 577 may include a cache controller 591 and a volatile memory 593. Volatile memory 593 may include both volatile memory data and volatile memory tags or only volatile memory data. Second layer 578 may include a volatile memory tag unit 592 and a data buffer 594. In various embodiments, volatile memory tag unit 592 is a segment of volatile memory used to store tags and/or is located in any layer of stacked volatile memory 512 having the prescribed volatile memory. In various embodiments, cache controller 591, volatile memory tag unit 592, data buffer 594, and volatile memory 593 are on any of the prescribed layers. The number of layers in stacked volatile memory 512 may vary (e.g., increase or decrease) depending on, for example, system design requirements of hybrid memory module 521.

In various embodiments, line 571 is a command-address-data bus between non-volatile memory 502 and stacked volatile memory 512. Line 573 is a command-address-data bus between non-volatile memory 504 and stacked volatile memory 512. In various embodiments, line 581 is a command-address-data bus between non-volatile memory 508 and stacked volatile memory 512. Line 583 is a command-address-data bus between non-volatile memory 506 and stacked volatile memory 512. Non-volatile memory 502 is coupled to stacked volatile memory 512 via line 571. Non-volatile memory 504 is coupled to stacked volatile memory 512 via line 573. Non-volatile memory 508 is coupled to stacked volatile memory 512 via line 581. Non-volatile memory 506 is coupled to stacked volatile memory 512 via line 583. Stacked volatile memory 512 is coupled to external command-address bus 554 via line 599. Stacked volatile memory 512 is coupled to external parallel data bus 552 via lines 595 and lines 596. Stacked volatile memory 512 is coupled to external parallel data bus 556 via lines 597.

In various embodiments, stacked volatile memory 512 of hybrid memory module 521 may include a single stacked volatile memory or multiple stacked volatile memories (e.g., High-Bandwidth Memory (HBM and HBM2)) and multiple non-volatile memory dies. HBM and HBM2 are first and second generation RAM interfaces for 3D stacked DRAM adopted by JEDEC as industry standards. Cache controller 591 (e.g., DRAM cache controller) is placed or located in the base logic die of stacked volatile memory 512 or on a separate die. In various embodiments, DRAM tags and/or data are placed in separate stacked volatile memories or are integrated into the same stacked volatile memory.

During operation of hybrid memory module 521, host 101 routes command, address, clock, and data signals to cache controller 591 of stacked volatile memory 512 via external command-address bus 554 and line 599. In various embodiments, first layer 577 is, for example, a logic layer of stacked volatile memory 512 and is used as cache controller 591. Cache controller 591 receives the command, address, clock, and data signals and acts as a centralized controller and buffer of the command, address, and data signals. Cache controller 591 processes the commands provided from the external command-address bus 554 and generates cache commands to provide to the non-volatile memory components, volatile memory components, data buffer components, and volatile tag memory components of hybrid memory module 521.

During a read operation request sent by host 101 and received by cache controller 591, cache controller 591 generates a read command to read or access the contents of the volatile memory tags located in volatile memory tag unit 592 of stacked volatile memory 512. In response, volatile memory tag unit 592 provides tag data responses to cache controller 591. The tag data responses may contain, for example, the contents of tags and are provided to cache controller 591. Cache controller 591 assesses the tags to determine whether a tag is a hit or a miss in volatile memory 593 of stacked volatile memory 512. When a hit occurs, cache controller 591 becomes aware that data exists in volatile memory 593 and fetches or accesses the corresponding data from volatile memory 593. The data, if any, fetched from volatile memory 593 is provided to and buffered in a buffer portion of cache controller 591 (e.g., data buffer 594). When data is received by cache controller 591, cache controller 591 sends a command to host 101 to notify host 101 that data has been received by cache controller 591 and is ready for retrieval by host 101.

When a miss occurs, cache controller 591 becomes aware that data exists in non-volatile memory 502, non-volatile memory 504, non-volatile memory 506, and/or non-volatile memory 508 and fetches or accesses the corresponding data from the corresponding non-volatile memory. The data from non-volatile memory 502, non-volatile memory 504, non-volatile memory 506, and/or non-volatile memory 508 is provided to the buffer portion of cache controller 591 (e.g., data buffer 594). When data is received by the buffer portion of cache controller 591, cache controller 591 sends a command to host 101 to notify host 101 that data has been received by the buffer portion of cache controller 591 and is ready for retrieval.

In various embodiments, access to data located in volatile memory 593 of stacked volatile memory 512 and access to a tag located in volatile memory tag unit 592 is performed in parallel with access to first non-volatile memory 502 and second non-volatile memory 504. In various embodiments, access is referred to as a first, second, or third access depending on, for example, the sequence and/or location of accesses conducted by the corresponding element.

In various embodiments, cache operations performed by cache controller 591 may include replacement operation commands. Note that, in various embodiments, pin requirements of stacked volatile memory 512 may resemble the pin requirements of the memory buffer (MB) in DDR3 LRDIMM since stacked volatile memory 512 has command, address, clock, and data connections to non-volatile memory components. In various embodiments, data communication between host 101 and hybrid memory module 521 occurs through stacked volatile memory 512 and is managed by cache controller 591 implemented in a logic base die of the stacked volatile memory 512. The base logic die of stacked volatile memory 512 may contain DRAM cache controller functionality. In various embodiments, DRAM tags and data are placed in, for example, separate banks, or separate layers (dies) within the stack. In various embodiments, for example, in HBM2, each die (layer) may have two or four individually-controlled channels and each channel may be 128 bits wide. In various embodiments, a dedicated layer or two are assigned for storing tags. In various embodiments, as the size of tags and metadata is significantly less than the size of data (e.g., 16B vs. 64B for each cache line), the channel width and internal data lines of tag layers are narrower than the channel width of DRAM layers of stacked volatile memory 512.
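
The relationship between payload size and channel width can be checked with a short calculation. The 16-byte, 64-byte, and 128-bit figures come from the example above; the 32-bit tag channel width and the function name are assumptions chosen only to illustrate why a narrower tag channel may suffice.

```python
def beats(payload_bytes: int, channel_bits: int) -> int:
    """Number of channel-wide transfers needed to move a payload (rounded up)."""
    payload_bits = payload_bytes * 8
    return (payload_bits + channel_bits - 1) // channel_bits


data_beats_128 = beats(64, 128)  # 64B cache line over a 128-bit data channel
tag_beats_128 = beats(16, 128)   # 16B tag and metadata over the same width
tag_beats_32 = beats(16, 32)     # 16B tag over an assumed 32-bit tag channel
```

Under these assumptions, a tag channel one quarter the width of the data channel moves a tag in the same number of transfers that the data channel needs for a cache line.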

A benefit of the organization illustrated in FIG. 5 is that stacked volatile memory 512 includes controller and buffer chip functionality and may contain both cached data and the associated tags. Thus, as depicted in FIG. 5, there may not be a need for separate buffer dies or tag dies. Thus, in various embodiments, the controller and buffer chip functionalities are integrated within a single die, reducing the bus connections within the module, reducing the number of dies within the module, and reducing the dimension of the module. Another benefit is that, in various embodiments, volatile memory data and tag accesses are performed in parallel with accesses to non-volatile memory.

In various embodiments, a single piece of data (i.e., a cache line) may be striped across the NVM dies. Access to stacked volatile memory data may include, for example, access to data located in data buffers, volatile memory, and/or non-volatile memory. In various embodiments, the cache controller of stacked volatile memory 512 accesses non-volatile memory 502, non-volatile memory 504, non-volatile memory 506, and/or non-volatile memory 508 to fetch a single cache line. In various embodiments, cache lines are interleaved over non-volatile memory 502, non-volatile memory 504, non-volatile memory 506, and/or non-volatile memory 508. For example, in various embodiments, each cache line may be interleaved over multiple NVM dies or each cache line is entirely placed in a single NVM die. In various embodiments, cache controller 591 of stacked volatile memory 512 orchestrates multiple data accesses over NVM dies in parallel, which improves throughput as accesses are distributed across NVM dies.
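
The two placements described above, striping one cache line across the NVM dies or keeping each cache line whole in a single die, may be sketched as follows. The function names, the equal-sized chunks, and the modulo die selection are assumptions for illustration; the description does not fix a mapping.

```python
from typing import List


def stripe_line(line: bytes, num_dies: int) -> List[bytes]:
    """Split one cache line into equal chunks, one chunk per NVM die."""
    if num_dies <= 0 or len(line) % num_dies != 0:
        raise ValueError("line length must divide evenly across the dies")
    chunk = len(line) // num_dies
    return [line[i * chunk:(i + 1) * chunk] for i in range(num_dies)]


def die_for_whole_line(line_addr: int, num_dies: int) -> int:
    """Pick the single die that holds an entire cache line (assumed modulo map)."""
    return line_addr % num_dies


line = bytes(range(64))        # a 64-byte cache line with recognizable contents
chunks = stripe_line(line, 4)  # striped placement: four dies, 16 bytes each
die = die_for_whole_line(6, 4) # whole-line placement: consecutive lines rotate dies
```

In the striped case a single cache line fetch touches every die at once, while in the whole-line case independent fetches of different lines can proceed on different dies in parallel.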

In various embodiments of the hybrid memory module, the tag checking operation occurs from the cache controller to the first volatile memory tag unit and the data movement operation occurs from at least one of the first volatile memory to the first data buffer and the first non-volatile memory to the first data buffer. In various embodiments, first data buffer refers to a first multiplexing data buffer.

In various embodiments, the hybrid memory module includes a second volatile memory coupled to the cache controller, a second data buffer coupled to the second volatile memory, a second non-volatile memory coupled to the second data buffer and the cache controller, wherein the second multiplexing data buffer multiplexes data between the second volatile memory and the second non-volatile memory. In various embodiments, second data buffer refers to a second multiplexing data buffer.

In various embodiments of the hybrid memory module, the cache controller buffers a plurality of external command signals and a plurality of address signals provided from a host. In various embodiments of the hybrid memory module, the first volatile memory is DRAM and the first non-volatile memory is at least one of NAND flash storage, NOR flash storage, phase change memory, and spin-transfer torque magnetic random-access memory.

In various embodiments, a hybrid memory module includes a first non-volatile memory, a first integrated control buffer coupled to the first non-volatile memory, and a first volatile memory and a first volatile memory tag unit coupled to the first integrated control buffer, wherein the first integrated control buffer is integrated with first cache integration logic to perform cache operations and the first integrated control buffer performs data multiplexing between at least two of the first non-volatile memory, the first volatile memory, and the first volatile memory tag unit. In various embodiments of the hybrid memory module, the cache operations include at least one of reading a designated tag address from the first volatile memory tag unit and performing a tag check operation.

In various embodiments of the hybrid memory module, the cache operations include at least one of instructing the first non-volatile memory and the first volatile memory to load a cache line when a miss operation occurs, write back to the cache line when an eviction occurs, or read the cache line when a hit operation occurs. In various embodiments of the hybrid memory module, the tag check operation occurs in parallel with a data movement operation. In various embodiments of the hybrid memory module, the first integrated control buffer provides data to an external parallel data bus.

In various embodiments, the hybrid memory module includes a second non-volatile memory and a second volatile memory coupled to a second integrated control buffer, wherein the second integrated control buffer is integrated with second cache integration logic to perform cache operations and the second integrated control buffer performs data multiplexing between at least two of the second non-volatile memory, the second volatile memory, and a second volatile memory tag unit.

In various embodiments of the hybrid memory module, the first volatile memory and the second volatile memory are DRAM and the first non-volatile memory and the second non-volatile memory are at least one of NAND flash storage, NOR flash storage, phase change memory, and spin-transfer torque magnetic random-access memory.

In various embodiments, a hybrid memory module includes a first non-volatile memory, and a stacked volatile memory coupled to the first non-volatile memory, the stacked volatile memory having a volatile memory tag unit, a data buffer, and a cache controller, wherein, in order to access stacked volatile memory data, a first access to data located in a volatile memory unit of the stacked volatile memory and a first access to a tag located in the volatile memory tag unit are performed in parallel with a first access to the first non-volatile memory.

In various embodiments of the hybrid memory module, the cache controller is located in a first layer of the stacked volatile memory and the volatile memory tag unit is located in a second layer of the stacked volatile memory. In various embodiments of the hybrid memory module, the cache controller is located in a base logic die of the stacked volatile memory.

In various embodiments of the hybrid memory module, data communication between a host coupled to the hybrid memory module and the stacked volatile memory occurs through the stacked volatile memory. In various embodiments of the hybrid memory module, data communication between a host coupled to the hybrid memory module and the stacked volatile memory is managed by the cache controller.

In various embodiments of the hybrid memory module, the data communication includes at least one of a command signal, a clock signal, and a data signal. In various embodiments of the hybrid memory module, the stacked volatile memory is stacked DRAM memory and the first non-volatile memory is at least one of NAND flash storage, NOR flash storage, phase change memory, and spin-transfer torque magnetic random-access memory.

Note that not all of the activities or elements described above in the general description are required, that a portion of a specific activity or device may not be required, and that one or more further activities may be performed, or elements included, in addition to those described. Still further, the order in which activities are listed is not necessarily the order in which they are performed.

Also, the concepts have been described with reference to specific embodiments. However, one of ordinary skill in the art appreciates that various modifications and changes can be made without departing from the scope of the present disclosure as set forth in the claims below. Accordingly, the specification and figures are to be regarded in an illustrative rather than a restrictive sense, and all such modifications are intended to be included within the scope of the present disclosure.

Benefits, other advantages, and solutions to problems have been described above with regard to specific embodiments. However, the benefits, advantages, solutions to problems, and any feature(s) that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as a critical, required, or essential feature of any or all the claims. Moreover, the particular embodiments disclosed above are illustrative only, as the disclosed subject matter may be modified and practiced in different but equivalent manners apparent to those skilled in the art having the benefit of the teachings herein. No limitations are intended to the details of construction or design herein shown, other than as described in the claims below. It is therefore evident that the particular embodiments disclosed above may be altered or modified and all such variations are considered within the scope of the disclosed subject matter. Accordingly, the protection sought herein is as set forth in the claims below.

What is claimed is:
 1. A hybrid memory module, comprising: a first non-volatile memory; a first integrated control buffer coupled directly to the first non-volatile memory, wherein the first integrated control buffer operates as both a data buffer and a multiplexer; and a first volatile memory and a first volatile memory tag unit coupled directly to the first integrated control buffer, wherein the first integrated control buffer is integrated with first cache integration logic to perform cache operations and the first integrated control buffer performs data multiplexing between at least two of the first non-volatile memory, the first volatile memory, and the first volatile memory tag unit, wherein the cache operations include at least one of instructing the first non-volatile memory and the first volatile memory to load a cache line when a miss operation occurs, write back to the cache line when an eviction occurs, or read the cache line when a hit operation occurs.
 2. The hybrid memory module of claim 1, wherein the cache operations include at least one of reading a designated tag address from the first volatile memory tag unit and performing a tag check operation.
 3. The hybrid memory module of claim 2, wherein the tag check operation occurs in parallel with a data movement operation.
 4. The hybrid memory module of claim 1, wherein the first integrated control buffer provides data to an external parallel data bus.
 5. The hybrid memory module of claim 1, further comprising a second non-volatile memory and a second volatile memory coupled to a second integrated control buffer, wherein the second integrated control buffer is integrated with second cache integration logic to perform cache operations and the second integrated control buffer performs data multiplexing between at least two of the second non-volatile memory, the second volatile memory, and a second volatile memory tag unit.
 6. The hybrid memory module of claim 5, wherein the first volatile memory and the second volatile memory are DRAM and the first non-volatile memory and the second non-volatile memory are at least one of NAND flash storage, NOR flash storage, phase change memory, and spin-transfer torque magnetic random-access memory.
 7. A hybrid memory module, comprising: a first non-volatile memory; and a stacked volatile memory coupled to the first non-volatile memory, the stacked volatile memory having a volatile memory tag unit, a multiplexing data buffer, and a cache controller, wherein the multiplexing data buffer is coupled directly to the first non-volatile memory and the stacked volatile memory and operates as both a data buffer and a multiplexer and, in order to access stacked volatile memory data, a first access to data located in a volatile memory unit of the stacked volatile memory and a first access to a tag located in the volatile memory tag unit are performed in parallel with a first access to the first non-volatile memory, wherein the cache controller is located in a first layer of the stacked volatile memory and the volatile memory tag unit is located in a second layer of the stacked volatile memory.
 8. A hybrid memory module, comprising: a first non-volatile memory; and a stacked volatile memory coupled to the first non-volatile memory, the stacked volatile memory having a volatile memory tag unit, a multiplexing data buffer, and a cache controller, wherein the multiplexing data buffer is coupled directly to the first non-volatile memory and the stacked volatile memory and operates as both a data buffer and a multiplexer and, in order to access stacked volatile memory data, a first access to data located in a volatile memory unit of the stacked volatile memory and a first access to a tag located in the volatile memory tag unit are performed in parallel with a first access to the first non-volatile memory, wherein the cache controller is located in a base logic die of the stacked volatile memory.
 9. The hybrid memory module of claim 8, wherein data communication between a host coupled to the hybrid memory module and the stacked volatile memory is managed by the cache controller.
 10. The hybrid memory module of claim 8, wherein the data communication includes at least one of a command signal, a clock signal, and a data signal.
 11. The hybrid memory module of claim 8, wherein the stacked volatile memory is stacked DRAM memory and the first non-volatile memory is at least one of NAND flash storage, NOR flash storage, phase change memory, and spin-transfer torque magnetic random-access memory.
 12. A hybrid memory module, comprising: a first non-volatile memory; and a stacked volatile memory coupled to the first non-volatile memory, the stacked volatile memory having a volatile memory tag unit, a multiplexing data buffer, and a cache controller, wherein the multiplexing data buffer is coupled directly to the first non-volatile memory and the stacked volatile memory and operates as both a data buffer and a multiplexer and, in order to access stacked volatile memory data, a first access to data located in a volatile memory unit of the stacked volatile memory and a first access to a tag located in the volatile memory tag unit are performed in parallel with a first access to the first non-volatile memory, wherein data communication between a host coupled to the hybrid memory module and the stacked volatile memory occurs through the stacked volatile memory. 
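
By way of illustration only, and without limiting the claims, the cache operations recited in claim 1 (loading a cache line when a miss occurs, writing back when an eviction occurs, and reading the cache line when a hit occurs) may be sketched as a small software model. The sketch below is not taken from the disclosure: the class name HybridModuleModel, the constant LINE_COUNT, the direct-mapped organization, and the write-back policy with a dirty flag are all assumptions chosen for brevity. It models the tag check and the data access sequentially, so the parallel tag check of claim 3 is not represented.

```python
from dataclasses import dataclass, field

LINE_COUNT = 4  # hypothetical number of cache lines held in the volatile memory


@dataclass
class HybridModuleModel:
    """Illustrative model only; direct mapping and write-back are assumptions."""
    nvm: dict = field(default_factory=dict)  # non-volatile memory: address -> data
    dram: list = field(default_factory=lambda: [None] * LINE_COUNT)   # volatile memory
    tags: list = field(default_factory=lambda: [None] * LINE_COUNT)   # tag unit
    dirty: list = field(default_factory=lambda: [False] * LINE_COUNT)
    log: list = field(default_factory=list)  # records each cache operation performed

    def _lookup(self, addr):
        # Tag check: compare the stored tag for this index with the address tag.
        index, tag = addr % LINE_COUNT, addr // LINE_COUNT
        return index, tag, self.tags[index] == tag

    def _fill(self, index, tag, addr):
        # Eviction: write a modified victim line back to non-volatile memory.
        if self.tags[index] is not None and self.dirty[index]:
            victim_addr = self.tags[index] * LINE_COUNT + index
            self.nvm[victim_addr] = self.dram[index]
            self.log.append("writeback")
        # Miss handling: load the requested line from non-volatile memory.
        self.dram[index] = self.nvm.get(addr, 0)
        self.tags[index] = tag
        self.dirty[index] = False
        self.log.append("load")

    def read(self, addr):
        index, tag, hit = self._lookup(addr)
        self.log.append("hit" if hit else "miss")
        if not hit:
            self._fill(index, tag, addr)
        return self.dram[index]  # on a hit, the line is read from volatile memory

    def write(self, addr, value):
        index, tag, hit = self._lookup(addr)
        self.log.append("hit" if hit else "miss")
        if not hit:
            self._fill(index, tag, addr)
        self.dram[index] = value
        self.dirty[index] = True  # line now differs from the non-volatile copy
```

In this model, reading an address that is not cached records a miss followed by a load, and a later access that maps to the same index with a different tag records a write-back of the modified line before the new line is loaded.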